Abstract. We present a numerical analysis of the entropy rate and statistical
complexity related to the spin flip dynamics of the 2D Ising Ferromagnet at different
temperatures T. We follow an information theoretic approach and test three different
entropy estimation algorithms to assess entropy rate and statistical complexity of binary
sequences. The latter are obtained by monitoring the orientation of a single spin on a
square lattice of side-length L = 256 at a given temperature parameter over time. The
different entropy estimation procedures are based on the M-block Shannon entropy (a
well established method that yields results for benchmarking purposes), non-sequential
recursive pair substitution (providing an elaborate and an approximate estimator)
and a convenient data compression algorithm contained in the zlib-library (providing
an approximate estimator only). We propose an approximate measure of statistical
complexity that emphasizes correlations within the sequence and which is easy to
implement, even by means of black-box data compression algorithms. Regarding the
2D Ising Ferromagnet simulated using Metropolis dynamics and for binary sequences
of finite length, the proposed approximate complexity measure is peaked close to the
critical temperature. For the approximate estimators, a finite-size scaling analysis
reveals that the peak approaches the critical temperature as the sequence length
increases. Results obtained using different spin-flip dynamics are briefly discussed.
The suggested complexity measure can be extended to non-binary sequences in a
straightforward manner.

1. Introduction 

The basic task of data compression algorithms is to discover patterns (synonymous 
with regularities, correlations, symmetries and structure; see Sect. II of Ref. [1]), and 
to remove the respective redundancies from supplied input data in order to minimize 
the space required to store the data. Interestingly, the pattern discovery and data 
compression process of particular data compression schemes finds application in contexts 
as diverse as, e.g., DNA sequence classification [2], entropy estimation [3, 4, 5], and, 
more generally, time series analysis [6]. 

Correspondingly, in the analysis of complex systems, one wants to find a measure for 
the information-theoretic "complexity" of a system [7]. The simplest measure is the 
entropy [8], but this is maximal for a purely random system. This contradicts the basic 
idea of complexity, which involves some structure, but not just regular structure. On the 
other hand, other measures of complexity, like a (minimal) algorithm/computer/circuit 
capable of generating (an instance of) the problem P [10, 11], are often impractical when 
it comes to the analysis of given large systems. Hence, data compression algorithms 
are a natural and in particular simple candidate for detecting complexity [11, 12]. Note 
that other practically computable approaches also exist, which indeed seem to measure 
complexity as expected, like mutual information [13, 14] or statistical complexity [13]. 

As a first step, different proposed quantities are usually applied to simple toy 
systems, e.g. models exhibiting only few states [HI HI [151 [12] • In statistical mechanics 
on the other hand, one studies models which involve many degrees of freedom with 
non-trivial interactions. Such models are regarded as being very complex often right 
at phase transitions [16]. This large degree of complexity is from the physical point of 
view visible via growing correlations in the system. An information-theoretic analysis 
of the complexity of such models has to our knowledge been considered so far only in 
few studies and only by example [17] . The aim of this work is to study extensively 
data compression algorithms for time series generated by the Ising model, which is one 
of the most fundamental and important models of statistical mechanics exhibiting a 
phase transition. In particular, we want to find out whether the phase transition can 
be detected, located and analyzed numerically with high precision just by looking at 
complexity measures derived from symbol substitution methods which can be used for 
data compression. 

By means of these methods, we attempt to estimate the entropy rate of symbolic 
sequences and we compute a measure to account for correlations that possibly 
characterize the sequence. The latter observable, here referred to as approximate 
complexity, is related to the excess entropy which characterizes the statistical complexity 
of the sequence and it can very well be understood in terms of information theory. 
Further, it can easily be computed by means of black-box data compression algorithms, 
as, e.g., the compress algorithm contained in the zlib-library [18]. While the entropy 
accounts for the randomness contained in the sequence, the approximate complexity is 
sensitive to correlations. 



Numerical entropy estimation and complexity - 2D Ising Ferromagnet 






The symbol substitution method considered in the bulk of the presented article 
is a particular dictionary based data compression scheme that operates by a 
non-sequential recursive pair substitution process and is hence referred to as NSRPS. The 
basic routine of the NSRPS algorithm is a pair substitution step that amounts to 
replacing the most frequent two-symbol subsequence by a new symbol that is shorter 
in length. If this process is performed repeatedly, it is possible to achieve a 
compression of the input data. Such pair substitution methods, intended to quantify 
the degree of "patterness" of symbolic sequences, were introduced several decades ago 
by Ebeling and Jimenez-Montano, see Ref. [19]. In Ref. [20], a sequence compressing 
algorithm based on the NSRPS paradigm was introduced and used to estimate the 
information content of binary sequences. Rigorous results on the NSRPS method, 
presented by Benedetto, Caglioti, and Gabrielli in Ref. [21], imply that for sufficiently 
long sequences the NSRPS method can be utilized to estimate the entropy rate of an 
ergodic process. On this basis, Calcagnile, Galatolo, and Menconi recently reported on 
numerical experiments, see Ref. [22], that compared entropy estimates arising from the 
NSRPS method and other well established methods considering different maps and a 
stationary process known as "renewal process". The authors found that the NSRPS 
method provided the best approximation to the respective entropy values. 

Most recently, an NSRPS based randomness measure for symbolic sequences was 
introduced and tested for short sequences [12]. In the latter study, the sequences 
were obtained from iterating the logistic map at different bifurcation parameters and 
applying a proper discretization procedure. Therein, the NSRPS based randomness 
measure appeared to be strongly correlated to the Lyapunov exponent of the map and 
hence could be used to quantify whether a particular sequence appears to have a simple 
or a random structure (note that the authors of Ref. [22] refer to it as "complexity 
measure". More precisely, this type of complexity is termed deterministic complexity [23] 
and it merely measures the randomness associated with a symbolic sequence. As regards 
this notion of deterministic complexity, the randomness measure presented in [12] is 
simply proportional to the entropy rate). Although the NSRPS method is of academic 
interest (as documented by the references above), it is not commonly used in practice. 
More frequently used dictionary compression algorithms are based on Lempel-Ziv (LZ) 
coding [21]. E.g., the compress data-compression tool available in the zlib-library 
uses LZ77 compression, a particular variant of LZ coding. Effectively, using LZ coding, 
subsequences of the input sequence are replaced by pointers that specify positions within 
the sequence where the respective subsequences have occurred earlier [24, 25]. This latter 
approach to compression is slightly different from the iterated pair substitution on which 
the NSRPS method is built. The entropy rate and approximate complexity of a 
sequence can also be approximated by means of the NSRPS based randomness measure 
and by using the compress-algorithm. Both methods, however, only allow one to compute 
an upper bound and are not as precise as the more elaborate estimates. 

In this work, we aim to assess how well the NSRPS based entropy estimation method 
described by Ref. [22] performs on binary sequences that represent the spin-flip dynamics 
of the 2D Ising Ferromagnet (FM) at different temperatures T. For this purpose we 
consider single-spin-flip Metropolis dynamics (main part of the presented article) as 
well as spin-flip dynamics induced by the Wolff cluster algorithm (results reported in 
subsection 4.4). The input data to be analyzed is given by sequences $S = (s_1, \ldots, s_N)$ of 
length $N$, consisting of symbols $s_i$ over the binary alphabet $\mathcal{A} = \{0, 1\}$. These sequences 
are obtained by monitoring the time-series related to the orientation of a single spin, 
located on a square lattice of side length L with fully periodic boundary conditions. 
In order to allow for a comparison of the results, we also estimate the entropy rate 
and complexity of the binary sequences by a well established approach based on the 
Shannon entropy, as presented in Ref. [26]. Previously, the latter approach led to the 
analysis of complexity-entropy diagrams that allow for a characterization of the temporal 
and spatial dynamics of various stochastic processes, including simple maps as well as 
Ising spin-systems, in purely information-theoretic coordinates [23]. We find that for the 
whole range of temperatures considered, the entropy rates and approximate complexities 
estimated via the elaborate NSRPS algorithm of Ref. [22] are in good agreement with 
the Shannon entropy based estimates following Ref. [26]. Furthermore, we find that 
the approximate complexity is peaked at a sequence-length dependent, effective critical 
temperature. A finite-size scaling analysis in the sequence length (where the size of the 
Ising model that supplies the binary input sequences is fixed to 256 x 256 spins) reveals 
that in the limit of infinitely long sequences, the peak is located close to the critical 
temperature $T_c \approx 2.269$ of the 2D Ising Ferromagnet. 

The remainder of the article is organized as follows. In Sect. 2 we introduce the 
well-established information theoretic notation, the entropy rate and the approximate 
complexity. For illustration and comparison, we also include the results of these 
observables for the 2D Ising FM as function of temperature. In Sect. 3 we discuss the 
symbolic substitution method and the three different entropy estimation procedures. 
The main part of this work consists of the results for these procedures and an assessment 
of their performance, as presented in Sect. 4. Finally, Sect. 5 concludes with a brief summary. 
A more elaborate summary of the presented article is available at the papercore database 
[27]. 

2. Basic notation from information theory and pattern discovery 

In subsection 2.1 we introduce basic notation from information theory, needed to 
motivate the observables that are considered in the remainder of the article. First 
and foremost, these are the entropy rate and excess entropy that might be associated 
with a sequence of symbols. If not stated explicitly, we thereby follow the notation used 
by Shalizi and Crutchfield [1], and Crutchfield and Feldman [26]. In subsection 2.2 we 
then discuss entropy rate and excess entropy as well as the convergence properties of 
the entropy rate by considering data obtained for the 2D Ising Ferromagnet at different 
temperatures including the critical point. Note that the results for the entropy rate and 
excess entropy are similar to those presented by Arnold in Ref. [17]; hence, they are 
included here for illustration and comparison only. On the other hand, the convergence 
of the entropy rate, which in turn leads to an easy to compute approximate measure 
of complexity, has to our knowledge not been discussed yet. Finally, in subsection 2.3 
we motivate an "approximate complexity" that quantifies for which parameters a given 
model exhibits a small or large statistical complexity. 

2.1. Block entropy, entropy rate convergence and complexity 

As pointed out in the introduction, the presented article addresses two issues: numerical 
estimation of the entropy rate associated with a sequence of symbols, and providing an 
approximate measure of statistical complexity that effectively accounts for the rapidity 
of entropy convergence. A prerequisite needed to define the subsequent observables is 
the Shannon entropy related to blocks of $M$ consecutive variables in a length-$N$ sequence 
$S = (s_1, s_2, \ldots, s_N)$. This quantity is defined as 

$$H(M) = -\sum_{s^M \in \mathcal{A}^M} \Pr(s^M) \log_2 \Pr(s^M) \qquad (1)$$

and is henceforth referred to as M-block Shannon entropy, where the sum runs over all possible 
strings $s^M$ of $M$ symbols ($M \geq 1$) from the alphabet $\mathcal{A}$, and where $\Pr(s^M)$ denotes the 
probability (i.e. the empirical rate of occurrence) of $s^M$ in the given sequence (where we 
agree to set $\log_2(0) = 0$). For a schematic plot of $H(M)$, see Fig. 1. In general, $H(M)$ is 
a nondecreasing function of $M$ bounded by $H(M) \leq M \log_2 |\mathcal{A}|$, which is attained if the 
probability of a string factorizes and each letter has the same probability of occurrence, 
i.e., $\Pr(s^M) = 1/|\mathcal{A}|^M$. In the limit of large block-sizes, $H(M)$ might not converge 
to a finite value. As a remedy, and in view of the above bound, the entropy rate 
is considered instead. The entropy rate (also termed per-symbol entropy) specifies the 
asymptotic rate of increase of the M-block Shannon entropy with the block length, 
i.e. 

$$h_\mu = \lim_{M \to \infty} \frac{1}{M} H(M). \qquad (2)$$
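
As an illustration of Eqs. (1) and (2), the empirical M-block entropy of a symbol sequence can be sketched in a few lines of Python. This is a minimal sketch and not the implementation used for the results reported here: the function names are hypothetical, and counting all overlapping length-M windows is an assumption about how the empirical rate of occurrence is obtained.

```python
from collections import Counter
from math import log2


def block_entropy(seq, M):
    """Empirical M-block Shannon entropy H(M) in bits, cf. Eq. (1)."""
    if M == 0:
        return 0.0  # convention H(0) = 0
    n_blocks = len(seq) - M + 1
    # Count every length-M window (overlapping windows: an assumption).
    counts = Counter(tuple(seq[i:i + M]) for i in range(n_blocks))
    return -sum(c / n_blocks * log2(c / n_blocks) for c in counts.values())


def entropy_rate_estimate(seq, M):
    """Finite-M stand-in for the limit in Eq. (2): H(M) / M."""
    return block_entropy(seq, M) / M
```

For a strictly alternating sequence such as `[0, 1] * 50`, `block_entropy(seq, 1)` gives one bit, while `block_entropy(seq, 2)` stays close to one bit, so that H(M)/M decreases with M as expected for a periodic sequence.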
For a given sequence it quantifies the randomness that remains after patterns on 
subsequences of increasing length are taken into account. In order to quantify its 
convergence properties, it is useful to consider finite-M approximations to the entropy 
rate. By considering the entropy gain $\Delta H(M) = H(M) - H(M-1)$ (with $H(0) = 0$) 
it is possible to show that $h_\mu = \lim_{M \to \infty} \Delta H(M)$ (see Sect. IIIB of Ref. [26]). Thus, 
one also uses the term apparent entropy rate 

$$h_\mu(M) = \Delta H(M) = H(M) - H(M-1). \qquad (3)$$

Alternatively one might refer to Eq. (2) and define 

$$h'_\mu(M) = H(M)/M, \qquad (4)$$

where both definitions (Eqs. (3) and (4)) are restricted to $M > 0$. Asymptotically 
it holds that $\lim_{M \to \infty} h'_\mu(M) = \lim_{M \to \infty} h_\mu(M)$, but $h'_\mu(M)$ typically converges slower 
than $h_\mu(M)$, see Ref. [28]. A measure that quantifies how much the entropy rate at block 
length $M$ exceeds the actual entropy rate is given by the per-symbol M redundancy 

$$r(M) = h_\mu(M) - h_\mu. \qquad (5)$$

A value $r(M) > 0$ indicates that upon considering blocks of length $M$, the asymptotic 
entropy rate of the sequence is overestimated. This overestimation is due to redundant 
information, i.e. patterns, that characterizes the sequence. Summing up all per-symbol 
M redundancies yields the excess entropy, which might also be referred to as (statistical) 
complexity (see Eqs. (2)-(4) of Ref. [17]): 

$$C_\mu = \sum_{M=1}^{\infty} r(M) = \sum_{M=1}^{\infty} [h_\mu(M) - h_\mu]. \qquad (6)$$

In order to compare Eq. (6) above to Eqs. (3) and (4) of Ref. [17], note that the sum 
in the former equation can be interpreted as an integration of the discrete function 
$h_\mu(M) - h_\mu = \Delta H(M) - h_\mu$. Now, bearing in mind that the entropy gain $\Delta H(M)$ 
signifies the discrete derivative of the entropy itself, this directly leads to 

$$C_\mu = \lim_{M \to \infty} [H(M) - M h_\mu]. \qquad (7)$$

For large blocksize this implies the scaling $H(M) \sim C_\mu + M h_\mu$, allowing for a 
geometric interpretation of entropy rate and complexity: the slope of the asymptotic 
straight-line approximation to $H(M)$ gives the randomness, while the intercept with the 
y-axis is equal to the complexity, see right of Fig. 1. 
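
The chain of definitions from the apparent entropy rates to the excess entropy can be made concrete with a short Python sketch. It is illustrative only: the function names and the returned dictionary are hypothetical, the sum of Eq. (6) is truncated at a finite block size `M_max`, and the asymptotic entropy rate is replaced by the apparent entropy rate at `M_max`, which is an assumption of the sketch.

```python
from collections import Counter
from math import log2


def block_entropy(seq, M):
    """Empirical M-block Shannon entropy H(M) in bits, with H(0) = 0."""
    if M == 0:
        return 0.0
    n_blocks = len(seq) - M + 1
    counts = Counter(tuple(seq[i:i + M]) for i in range(n_blocks))
    return -sum(c / n_blocks * log2(c / n_blocks) for c in counts.values())


def entropy_convergence(seq, M_max):
    """Apparent entropy rates, redundancies and truncated excess entropy."""
    H = [block_entropy(seq, M) for M in range(M_max + 1)]
    h = [H[M] - H[M - 1] for M in range(1, M_max + 1)]  # entropy gain, Eq. (3)
    h_alt = [H[M] / M for M in range(1, M_max + 1)]     # H(M)/M, Eq. (4)
    h_mu = h[-1]                   # assumption: h_mu approximated at M_max
    r = [h_M - h_mu for h_M in h]  # per-symbol redundancies, Eq. (5)
    C_mu = sum(r)                  # Eq. (6), truncated at M_max
    return {"H": H, "h": h, "h_alt": h_alt, "h_mu": h_mu, "r": r, "C_mu": C_mu}
```

With this truncation the sum of the redundancies telescopes to H(M_max) - M_max h_mu, i.e. the finite-M version of Eq. (7). For the period-two sequence `[0, 1] * 500` the sketch returns a complexity close to one bit, which is the information needed to fix the phase of the pattern.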



2.2. Results for the 2D Ising Ferromagnet 

Figure 2. Scaling properties of the apparent entropy rates for three exemplary 
temperatures located below, close to, and above the critical temperature. For all 
quantities an average $\langle \ldots \rangle$ over independent sequences of length N = 10^ is computed. 
(a) The estimator $\langle h_\mu(M) \rangle$ for the apparent entropy rate converges much faster than the 
estimator $\langle h'_\mu(M) \rangle$. (b) The associated per-symbol M redundancies $h_\mu(M) - h_\mu$ decay 
faster than $\propto 1/M$ (for $M$ large enough this holds also for $h'_\mu(M) - h_\mu$, see inset). The 
dashed line indicates a scaling $\propto 1/M$. 

As pointed out in the introduction, the sequences that are under scrutiny here are 
obtained by simulating a 2D Ising FM on a regular square lattice with side-length L, 
using Metropolis dynamics [29] at a selected temperature $T \in [2, 2.8]$. For the purpose 
of modeling the binary sequences, a particular spin on the lattice is chosen as a "source", 
emitting symbols from the binary alphabet $\mathcal{A} = \{0, 1\}$ (after a simple transformation 
of the spin variables). Therefore, the orientation of the source-spin is monitored during 
a number of Monte Carlo (MC) sweeps to yield a sequence of a particular length. 
Before the spin orientation is recorded, a sufficient number of sweeps are performed to 
ensure that the system is equilibrated. In this regard, for a square lattice with side 
length L = 128, and by analyzing the magnetization of the system, we observed an 
equilibration time of approximately $T_{\mathrm{eq}} = 3000$ MC sweeps for the lowest temperature. 
However, for each system considered we discarded the first 10^ sweeps to avoid initial 
transients. 
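
The sampling procedure described above might be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the text: units with J = k_B = 1, random-site updates, an ordered initial configuration, one recorded symbol per MC sweep, and the mapping (s + 1)/2 from spin values to binary symbols. The function name and its parameters are hypothetical.

```python
import math
import random


def metropolis_spin_sequence(L, T, n_sweeps, n_equil, seed=0):
    """Binary time series of one source spin under single-spin-flip dynamics."""
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]  # assumption: ordered start

    def sweep():
        # One MC sweep = L*L attempted flips at randomly chosen sites.
        for _ in range(L * L):
            i, j = rng.randrange(L), rng.randrange(L)
            # Sum over the four neighbors, fully periodic boundaries.
            nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                  + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
            dE = 2 * spins[i][j] * nb  # energy change of the flip (J = 1)
            if dE <= 0 or rng.random() < math.exp(-dE / T):
                spins[i][j] = -spins[i][j]

    for _ in range(n_equil):  # discard initial transients
        sweep()
    sequence = []
    for _ in range(n_sweeps):
        sweep()
        sequence.append((spins[0][0] + 1) // 2)  # spin -1/+1 -> symbol 0/1
    return sequence
```

A call such as `metropolis_spin_sequence(L=16, T=2.3, n_sweeps=1000, n_equil=500)` yields a list of 1000 binary symbols. A pure-Python loop of this kind is far too slow for the lattice sizes and sequence lengths considered in the text and only serves to fix ideas.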

Figure 3. Results for the average entropy rate $\langle h_\mu \rangle$ and the average complexity $\langle C_\mu \rangle$ 
for binary sequences that describe the spin-flip dynamics for the 2D Ising Ferromagnet 
at different temperatures T considering sequences of length N = 10^. As explained in 
the text, the maximally feasible block-size for the computation of the observables is set 
to M = 10. (a) Convergence of the apparent entropy rates $\langle h_\mu(M) \rangle$ to the asymptotic 
entropy rate $\langle h_\mu \rangle$ for different block sizes $M = 1, \ldots, 10$. The results for $M > 3$ fall 
almost on top of each other. Note that $\langle h_\mu \rangle$ only gives an upper bound on the actual 
entropy rate. (b) Average complexity $\langle C_\mu \rangle$ and contribution of the different per-symbol 
M redundancies $\langle r(M) \rangle$. 

In the numerical experiments, the M-block Shannon entropy can only be computed 
for block-sizes smaller than some maximal size $M_{\max}$. Otherwise, for an input sequence 
of finite length, the (true) distribution of possible configurations associated with blocks of 
M consecutive variables will be approximated poorly (see Refs. [28, 30]). Effectively, 
this provides only an upper bound which might nevertheless yield a reasonable 
approximation to the actual entropy of the considered sequences. Further, the sum 
in Eq. (6) needs to be truncated to a finite number of terms, implying that only a lower 
bound on the excess entropy can be computed. The quality of the lower bound depends on 
the rapidity of the convergence of the apparent entropy rate $h_\mu(M)$. In this regard, Fig. 
2(a) shows the convergence for the two estimators $h_\mu(M)$ and $h'_\mu(M)$ of the apparent 
entropy rate as function of the block length M for three exemplary temperatures located 
below, close to, and above the critical temperature. The input sequences had a length 
of N = 10^. As evident from the figure, $\langle h_\mu(M) \rangle$ converges substantially faster than 
$\langle h'_\mu(M) \rangle$. The brackets $\langle \ldots \rangle$ indicate an average over independent sequences. For 
comparison, at T = 2.267 (i.e. close to the critical point) a fit to the functional form 
$\langle h'_\mu(M) \rangle = h'_\mu + a/M$ for the block-size interval $M \in [5, 10]$ yields the parameters 
$h'_\mu = 0.4388(6)$ and $a = 0.290(1)$, the reduced chi-square being $\chi^2/\mathrm{dof} = 0.09$. Note 
that the fit function above describes the data quite well; however, below we argue 
that the per-symbol M redundancies decay somewhat faster than the $\propto 1/M$ behavior that 
the choice of the particular scaling function suggests. For comparison, at M = 10 we find 
$\langle h_\mu(10) \rangle = 0.436(1)$. For temperatures away from the critical point the estimates $h'_\mu$ 
and $\langle h_\mu(10) \rangle$ agree even better. Hence, in order to estimate the entropy rate $h_\mu$ and 
complexity $C_\mu$ we here consider the maximally feasible blocksize to be $M_{\max} = 10$. 
Consequently, the (average) asymptotic entropy rate is set to $\langle h_\mu \rangle = \langle h_\mu(10) \rangle$. Again, 
note that this only provides an upper bound to the true entropy rate. The difference 
to the latter is due to long-range correlations in the sequences that are missed by 
restricting the analysis to blocks of maximal length $M_{\max} = 10$. Fig. 2(b) shows 
the scaling properties of the per-symbol M redundancies $\langle r(M) \rangle$, defined in Eq. (5), 
for the particular choice $M_{\max} = 10$. As evident from the main plot of the figure, 
the redundancy $\langle h_\mu(M) \rangle - \langle h_\mu \rangle$ decays faster than $\propto 1/M$. For large enough 
block-size this holds also for the alternative definition $\langle h'_\mu(M) \rangle - \langle h_\mu \rangle$, see inset of Fig. 2(b). 
Such a scaling behavior is characteristic for finitary processes, i.e. processes with a finite 
complexity $C_\mu$. 
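
The extrapolation of H(M)/M via a fit of the form h + a/M, as used above for comparison, can be sketched with an ordinary least-squares fit that is linear in 1/M. The sketch is unweighted, whereas a reduced chi-square as quoted in the text requires error bars on the data points; the function name is hypothetical.

```python
def fit_inverse_M(block_sizes, h_values):
    """Unweighted least-squares fit of h(M) = h_inf + a / M.

    Returns the pair (h_inf, a).
    """
    x = [1.0 / M for M in block_sizes]  # the model is linear in 1/M
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(h_values) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, h_values))
    a = s_xy / s_xx
    h_inf = mean_y - a * mean_x
    return h_inf, a
```

Applied to synthetic data generated exactly from 0.4388 + 0.290/M on the interval M = 5, ..., 10, the function recovers both parameters up to rounding errors.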

Finally, results for the numerical estimation of the entropy rate and the complexity 
considering $M_{\max} = 10$ as a function of temperature T are illustrated in Figs. 3(a) 
and (b), respectively. The estimate of the average entropy rate $\langle h_\mu \rangle$, computed as 
explained above and shown in Fig. 3(a), will subsequently serve as a benchmark against 
which estimates that rely on the methods described in Sect. 3 will be compared. The 
numerical derivative of the entropy rate with respect to the temperature indicates that 
right at the critical temperature, the increase of $\langle h_\mu \rangle$ is strongest (not shown). Further, 
the fluctuations $\langle h_\mu^2 \rangle - \langle h_\mu \rangle^2$ exhibit an accentuated peak at $T_c$. In the 
high-temperature paramagnetic phase, i.e. for all temperatures $T > T_c$, the single-symbol 
entropy rate assumes its extremal value $H(1) = \log_2(|\mathcal{A}|) = 1$. Also note that the 
complexity shown in Fig. 3(b) has an isolated peak close to the critical temperature. 
As evident from the latter figure, all terms in the sum of Eq. (6) (i.e. the individual 
per-symbol M redundancies) display a similar scaling behavior. 

Again, note that part of these illustrating results on the one-dimensional symbolic 
sequences reported above are qualitatively similar to those reported earlier in Ref. [17]. 
Conceptually similar analyses carried out on two-dimensional configurations of spins 
obtained from a simulation of the 2D Ising FM, reported in Ref. [23], conclude that 
the excess entropy is peaked at a temperature $T \approx 2.42$ in the paramagnetic phase 
slightly above the true critical temperature. Similar results on the mutual information 
(which is equivalent to the excess entropy; see Ref. [26]) for the 2D Ising FM (and 
more general classical 2D spin models) were recently presented in Ref. [31]. Therein, 
the authors conclude that the mutual information reaches a maximum in the 
high-temperature paramagnetic phase close to the system parameter $K = J/k_B T \approx 0.41$ (for 
$J = k_B = 1$ this corresponds to $T \approx 2.44$). Our new results and analyses, which go 
beyond the cited literature, are presented in our main result part, Sect. 4. 

2.3. An approximate measure of statistical complexity 

Regarding the complexity, the convergence properties of the per-symbol M redundancies 
as a function of the block-size, displayed in Fig. 3(b), suggest that $r(1)$ constitutes 
the dominant contribution to the sum in Eq. (6). As evident from the figure, $r(2)$ is 
approximately one order of magnitude smaller than $r(1)$. This, in tandem with the observation 
that, pictured as a function of temperature, $r(1)$ already has the shape characteristic 
for $C_\mu$ (see Fig. 3(b)), leads us to suggest $r(1)$ as an approximate estimator that might 
tell under which circumstances a given model exhibits a larger or smaller complexity. 
Using the fact that $h_\mu(1) \geq h_\mu$, we here define the approximate complexity $c_\mu \in [0, 1]$ as 

$$c_\mu = r(1)/h_\mu(1) = 1 - h_\mu/h_\mu(1). \qquad (8)$$

By definition, it is related to the complexity $C_\mu$ that quantifies the convergence 
properties of the entropy rate. Appealing to the definition of the per-symbol M 
redundancies, $c_\mu$ quantifies the amount by which the entropy rate on the single-symbol 
level exceeds the asymptotic entropy rate. To support intuition on the gross behavior of 
the approximate complexity, note that the larger the correlations between the symbols 
in a given sequence, the more patterns are missed by considering $h_\mu(1)$ in comparison 
to $h_\mu$, and the larger the numerical value of $c_\mu$. In the two limits of completely 
ordered and fully random symbol sequences, $c_\mu$ assumes a value of zero. 
Note that Eq. (8) is conceptually similar to the multi-information (given by the entropy 
rate difference between an elementary subsystem, i.e. a single spin, and the infinite 
system) introduced by Erb and Ay in Ref. [32]. There, the authors considered the 
multi-information to characterize spatial spin-configurations for the 2D Ising FM in the 
thermodynamic limit by analytic means. Among other things, the authors conclude 
that the multi-information exhibits an isolated global maximum right at the critical 
temperature (see Theorem 3.3 of Ref. [32]). 

The benefit of the approximate complexity is that if there is a way to numerically 
estimate the entropy rate (either as explained above, or by one of the algorithms 
introduced below in Sect. 3), there will immediately be a way to estimate also $h_\mu(1)$. 
The proposed measure of approximate complexity can even be computed by means of 
black-box data compression algorithms. To facilitate intuition on that issue, consider a 
sequence $S$ of symbols that stems from the observation of a stochastic and ergodic 
process that possibly contains long-range correlations. Now, picture a black-box 
algorithm $A[\cdot]$ that upon postprocessing $S$ yields some estimate of the entropy rate, 
i.e. $h_\mu^{A} = A[S]$. So as to pave the way towards an estimate of $h_\mu^{A}(1)$, consider the 
following: for a process in which the values of the variables are independently and 
identically distributed, i.e. for an IID process, it holds that 
$h_\mu^{\mathrm{IID}} = h_\mu^{\mathrm{IID}}(1) = h_\mu^{\mathrm{IID}}(2) = \ldots$. 
For an IID process the block entropy grows linearly with the blocksize, and the 
associated complexity is zero (see Sect. V.A. on IID processes in Ref. [26]). An IID 
sequence related to the observed sequence $S$ is easily obtained as $S^{\mathrm{IID}} = \pi[S]$, wherein 
$\pi[\cdot]$ signifies the permutation operator. Applying the permutation operator to the 
observed sequence destroys all patterns and yields an IID sequence with the same 
symbol frequencies as contained in $S$. In Ref. [33] this is referred to as "standard 
random shuffle". Consequently, an estimate of the single-symbol entropy rate using 
$A[\cdot]$ is provided by $h_\mu^{A}(1) = A[\pi[S]]$. Note that for the Ising FM we find that at 
temperatures above the critical point it holds that $h_\mu(1) \approx 1$ (see Fig. 3(a)), hence we 
find $c_\mu = 1 - h_\mu$ for $T > T_c$. 
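
The black-box recipe above, i.e. one call of the estimator on the observed sequence and one call on a shuffled copy, can be sketched with the `zlib` module of the Python standard library. The sketch rests on assumptions that are not taken from the text: each symbol is stored in one byte before compression, the compressed size in bits per symbol serves as the entropy rate estimate, and the function names are hypothetical. For short sequences the fixed overhead of the compressed stream biases both estimates upwards.

```python
import random
import zlib


def zlib_entropy_rate(seq):
    """Compressed size in bits per symbol, a crude entropy rate estimate."""
    data = bytes(seq)  # assumption: one byte per symbol, symbols in 0..255
    return 8 * len(zlib.compress(data, 9)) / len(seq)


def approximate_complexity(seq, estimator=zlib_entropy_rate, seed=0):
    """c = 1 - h / h(1), with h(1) estimated from a shuffled copy of seq."""
    shuffled = list(seq)
    random.Random(seed).shuffle(shuffled)  # standard random shuffle
    h = estimator(seq)        # estimate for the observed sequence
    h_1 = estimator(shuffled)  # estimate for the pattern-free surrogate
    return 1.0 - h / h_1
```

For a strictly periodic input such as `[0, 1] * 2000` the result is close to one, whereas for an IID input it is close to zero. Since both estimates fluctuate, slightly negative values can occur for sequences without correlations.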

3. Pattern discovery by means of symbolic substitution methods 

As pointed out in the introduction, pair substitution methods like the NSRPS 
method, intended to quantify the degree of patterness of symbolic sequences, were 
introduced several decades ago by Ebeling and Jimenez-Montano, see Ref. [19]. The 
underlying elementary pair-substitution process is illustrated below in subsection 3.1. 
An algorithmic procedure that uses the NSRPS method in order to provide an elaborate 
estimate of the entropy rate for a symbolic sequence is explained in subsection 3.2. Two 
further approaches to approximately estimate the entropy rate are motivated in subsection 
3.3. 

3.1. Non-sequential recursive pair substitution (NSRPS) 

Figure 4. Stopping criteria used to terminate the NSRPS procedure. (a) Considering 
sequences at different temperatures T, the entropy rate for sequence lengths N > 10^ 
converges to a plateau at $n \approx 5 - 20$ pair substitution steps. In order to compute the 
entropy rate via Eq. (9) we thus fix N = 10^ and consider a number of n = 20 
pair substitution steps. (b) For sufficiently large sequence length the frequency $f_{\mathrm{mfp}}$ 
of the most frequent pair decreases algebraically with the number $n$ of pair substitution 
steps. For other temperatures, the scaling behavior appears to be the same. 
The inset illustrates the scaling behavior of the entropy rate as a function of the 
frequency $f_{\mathrm{mfp}}$. The dashed vertical line corresponds to the stopping condition 
$f_{\mathrm{mfp}} = 0.02$ used by Ref. [22]. 

In order to describe an elementary non-sequential recursive pair substitution process, 
consider a sequence $S^{(0)} = (s_1^{(0)}, \ldots, s_N^{(0)})$, composed of symbols stemming from a finite 
m-symbol alphabet $\mathcal{A} = \{a_i\}_{i=0}^{m-1}$. Bear in mind that here, the initial sequences are 
assembled of symbols $s_i^{(0)}$ that stem from $\mathcal{A} = \{0, 1\}$. The fundamental routine of 
the NSRPS algorithm, referred to as pair substitution, might be illustrated as a two-step 
procedure: 

(i) For a given sequence $S^{(0)}$, determine the frequency of all ordered pairs $e \in \mathcal{A} \times \mathcal{A}$ 
and identify $e_{\mathrm{mfp}}$, signifying the most frequent (ordered) length-two subsequence of 
symbols. If the most frequent pair is not unique, designate one of them as $e_{\mathrm{mfp}}$. 

(ii) Construct a new sequence $S^{(1)}$ from $S^{(0)}$, wherein each full pattern $e_{\mathrm{mfp}}$ is replaced 
by a new symbol $a_m$. For this purpose, $S^{(0)}$ is scanned from left to right, and the 
existing alphabet $\mathcal{A}$ is augmented by $a_m$. 

This elementary pair substitution process might be executed iteratively. If one keeps 
track of the most frequent pairs of symbols and their substitutes at each iteration step, 
the initial sequence can be reconstructed at any time. This offers the possibility to 
design lossless compression algorithms based on NSRPS. 
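
The two steps (i) and (ii) and the reconstruction of the previous sequence can be sketched as follows. The sketch is illustrative and makes two assumptions: pair frequencies are counted over all overlapping positions, and ties between equally frequent pairs are resolved in favor of the pair encountered first. The function names are hypothetical.

```python
from collections import Counter


def pair_substitution_step(seq, new_symbol):
    """One NSRPS step: replace the most frequent ordered pair by new_symbol.

    Returns the new sequence and the substituted pair.
    """
    pair_counts = Counter(zip(seq, seq[1:]))  # step (i): count ordered pairs
    if not pair_counts:
        return list(seq), None
    # Ties are resolved in favor of the pair encountered first.
    mfp = pair_counts.most_common(1)[0][0]
    out, i = [], 0
    while i < len(seq):  # step (ii): scan from left to right
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == mfp:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out, mfp


def undo_pair_substitution(seq, new_symbol, pair):
    """Invert one step by expanding new_symbol back into the stored pair."""
    out = []
    for s in seq:
        out.extend(pair if s == new_symbol else (s,))
    return out
```

For the sequence `[0, 1, 0, 1, 1, 0, 1]` the most frequent pair is `(0, 1)`, and substituting it by the new symbol `2` gives `[2, 2, 1, 2]`. Storing the list of substituted pairs is all that is required to undo the steps in reverse order.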

3.2. Numerical estimation of the entropy rate following Calcagnile et al. 

Based on the rigorous results reported in Ref. [21], and further work by Ref. [22], 
an elaborate entropy estimation algorithm based on the NSRPS method can be 
implemented. Therein, the idea is that after a couple of pair substitutions, the most 
frequent blocks, signifying the most common patterns up to a certain length within 
the sequence, are condensed into single symbols. Thus, it should be possible to find a 
good approximation to the actual entropy by considering the M-block entropy (block 
entropy = BE) for small blocksize M only. Following Ref. [22], and denoting a symbolic 
sequence after a number of n pair substitution steps as $S^{(n)}$, a connection to the 
per-symbol entropy is available as 

$$h^{(\mathrm{NSRPS\text{-}BE})}[S^{(0)}] = \frac{\ell(S^{(n)})}{N} \left[ H(S^{(n)}, 2) - H(S^{(n)}, 1) \right], \qquad (9)$$

where $N$ refers to the length of the initial sequence $S^{(0)}$, $H(S, M)$ specifies the M-block
entropy as estimated for the particular sequence $S$, and $\ell(S)$ stands for the length of the
sequence $S$ (note that $\ell(S^{(0)}) = N$). From the point of view of numerical experiments,
note that for a finite value of N, the number n of pair substitution steps is limited.
If the value for n is chosen to be comparatively large and $\ell(S^{(n)})$ gets rather small,
the statistics for the M = 2 and M = 1 block entropies might become poor. So as to cast
Eq. (9) into a working algorithm for a symbolic sequence of length N, a proper stopping
criterion for the iteration of the pair substitution step is required. Regarding that issue,
the authors of Ref. [22] decided to stop the iteration of the pair substitution process
as soon as the frequency of the most frequent pair gets smaller than $f_{\mathrm{mfp}} = 0.02$. We
here follow a slightly different approach that nevertheless yields quite similar results:
by considering sequences at different temperatures, we monitored the evolution of
the entropy rate as a function of the number of pair substitution steps. Regardless of the
temperature we found that for sufficiently long sequences (N > 10^) and after a number
of approximately 5 to 30 pair substitution steps, the entropy rate converges to a plateau
before it starts to decrease towards zero. In Fig. 4(a) this is illustrated for
three exemplary temperatures. Consequently, in order to assess the entropy rate via
Eq. (9) (following the approach of Calcagnile et al.), we here fix the sequence length to
N = 10^ and perform a number of n = 25 pair substitution steps for all the sequences
considered. Regarding the frequency $f_{\mathrm{mfp}}$ of the most frequent pair we found that for
sequences that are not too short (i.e. sequence lengths N > 10^), it decreases algebraically
(i.e. as a power law) with the number of pair substitution steps, see Fig. 4(b). The stopping
criterion of Ref. [22] would thus correspond to n ≈ 32. The inset of Fig. 4(b) shows that
for sequence lengths N > 10^ the stopping condition $f_{\mathrm{mfp}} = 0.02$ yields an entropy value
along a plateau immediately before the entropy estimate starts to decrease.
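
The estimator of this subsection can be sketched in a few lines of Python. The sketch below assumes that the estimate is the difference of the 2-block and 1-block entropies of the substituted sequence, rescaled by the ratio of the substituted to the original sequence length, and that pairs of identical symbols are counted without overlap. The function names (`block_entropy`, `most_frequent_pair`, `substitute`, `nsrps_be`) and the tie-breaking rule are illustrative choices and are not taken from Ref. [22].

```python
import math
import random
from collections import Counter


def block_entropy(seq, m):
    """Shannon entropy (in bits) of the overlapping length-m blocks of seq."""
    blocks = Counter(tuple(seq[i:i + m]) for i in range(len(seq) - m + 1))
    total = sum(blocks.values())
    return -sum(c / total * math.log2(c / total) for c in blocks.values())


def most_frequent_pair(seq):
    """Most frequent pair; a run such as 000 contributes a single pair 00."""
    counts = Counter()
    i = 0
    while i < len(seq) - 1:
        counts[(seq[i], seq[i + 1])] += 1
        overlap = i + 2 < len(seq) and seq[i] == seq[i + 1] == seq[i + 2]
        i += 2 if overlap else 1
    # Ties are broken in favour of the pair encountered first (assumption).
    return max(counts, key=counts.get)


def substitute(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def nsrps_be(seq, n_steps):
    """Entropy-rate estimate after at most n_steps pair substitutions."""
    n_total = len(seq)
    current = list(seq)
    next_symbol = max(current) + 1
    for _ in range(n_steps):
        if len(set(current)) < 2:  # constant sequence: nothing left to do
            break
        current = substitute(current, most_frequent_pair(current), next_symbol)
        next_symbol += 1
    if len(current) < 2:
        return 0.0
    return len(current) / n_total * (
        block_entropy(current, 2) - block_entropy(current, 1)
    )


# Usage: a pseudo-random binary sequence should give a value close to one.
rng = random.Random(1)
bits = [rng.randint(0, 1) for _ in range(20000)]
print(nsrps_be(bits, 3))
```

For a strictly periodic sequence such as 1010... the same function returns a value close to zero, in line with the notion of a vanishing entropy rate for regular sequences.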



3.3. Approximation of the entropy rate using further symbol substitution techniques

The following subsection is based on the observation that data compression methods
make it possible to distinguish between regular and random sequences in the following sense:
a sequence that contains patterns (possibly on many scales) is not random but is
compressible by means of symbol substitution methods. Therein, a sequence $S$ is
considered to be random if there exists no shorter sequence (written in the same
alphabet as $S$) from which $S$ can be constructed. This further implies that the fewer patterns a
sequence exhibits, the less compressible it is. In terms of the sequences obtained for the
2D Ising Ferromagnet it is thus intuitive that a sequence recorded at low T is highly
compressible, whereas a sequence recorded at T → ∞ cannot be compressed much. The
sequences are called algorithmically simple and algorithmically random, respectively.

Now, consider an algorithm $M[\cdot]$ that returns some quantity describing how
compressible a given sequence appears to be. In this regard, the notion of algorithmically
simple (random) shall translate to a small (large) value returned by $M[\cdot]$. Further,
consider a length-N sequence $S(T)$ recorded at finite temperature T as well as a whole set of
length-N sequences $S(\infty)$ recorded at T = ∞. A measure that might be used to approximate
the entropy rate is given by the algorithmic entropy (AE), here defined as

$$h_N^{(\mathrm{AE})}[S(T)] = \frac{M[S(T)]}{\langle M[S(\infty)] \rangle}. \qquad (10)$$

The value of $\langle M[S(\infty)] \rangle$ is used to normalize the observable so that as T → 0 (T → ∞)
algorithmically simple (random) corresponds to $\langle h_N^{(\mathrm{AE})} \rangle \to 0$ ($\langle h_N^{(\mathrm{AE})} \rangle \to 1$). Therein,
the brackets $\langle \cdot \rangle$ denote an average over different sequences. Note that compression
based estimators only provide upper bounds to the true per-symbol entropy. However,
the aim of the present subsection is not to provide estimators that compete with those
presented earlier in subsects. 2.2 (BE) and 3.2 (NSRPS-BE), but to prepare
easy-to-compute approximations to the approximate
complexity (as reported later in subsects. 4.1 and 4.2).



Lempel-Ziv coding: If we consider data compression algorithms based on Lempel-Ziv
coding, as, e.g., the compress algorithm contained in the zlib-library [18], where $M[\cdot]$
returns the length of the compressed sequence, i.e. $M[S(T)] = \ell(\mathrm{compress}[S(T)])$, then
$h_N^{(\mathrm{AE})}$ effectively corresponds to the algorithmic entropy according to Lempel and Ziv as
used by Ref. [3] and we define

$$h_N^{(\mathrm{ZLIB-AE})}[S(T)] = \frac{\ell(\mathrm{compress}[S(T)])}{\langle \ell(\mathrm{compress}[S(\infty)]) \rangle}. \qquad (11)$$

Note that Eq. (11) represents a fully data-compression based measure for the entropy
rate.
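
A minimal sketch of this compression-based estimate, using Python's standard `zlib` module, is given below. It assumes that each binary symbol is stored as one byte before compression and that pseudo-random bit sequences stand in for the T = ∞ reference sequences; how the symbols were actually encoded for zlib is not specified in the text, and the helper names `compressed_length` and `zlib_ae` are hypothetical.

```python
import random
import zlib


def compressed_length(bits):
    """Length in bytes of the zlib-compressed sequence (one byte per symbol)."""
    return len(zlib.compress(bytes(bits), 9))


def zlib_ae(bits, reference_sequences):
    """Compressed length of bits divided by the mean over the references."""
    mean_ref = sum(compressed_length(r) for r in reference_sequences) / len(
        reference_sequences
    )
    return compressed_length(bits) / mean_ref


# Usage: reference sequences emulate the fully random T = infinity limit.
rng = random.Random(0)
n = 10000
references = [[rng.randint(0, 1) for _ in range(n)] for _ in range(20)]
ordered = [0] * n                               # algorithmically simple
disordered = [rng.randint(0, 1) for _ in range(n)]  # algorithmically random
print(zlib_ae(ordered, references), zlib_ae(disordered, references))
```

The ordered sequence yields a value close to zero and the disordered one a value close to one, which is the normalization intended by the algorithmic entropy defined above.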



NSRPS based symbol substitution: Recently, Nagaraj et al. detailed a method to
measure the degree of randomness for symbolic sequences [12]. The idea behind their
measure is as follows: for a given sequence $S$ the pair substitution process can be
iterated until the representation of the sequence requires only a single character. If this
occurs after $N_{\mathrm{ps}}$ pair substitution steps, the corresponding sequence has zero entropy,
i.e. $H(S^{(N_{\mathrm{ps}})}, 1) = 0$, and $S^{(N_{\mathrm{ps}})}$ is called a constant sequence. In Ref. [12], the minimal
number of pair substitution steps $N_{\mathrm{ps}}$ needed to transform $S$ to a constant sequence is
adopted as a measure of algorithmic randomness associated with the initial sequence. As
an example, consider the sequence S = 110010. It consists of six symbols and exhibits
the maximal single-symbol entropy for a binary sequence. A first application of the pair
substitution routine identifies the most frequent pair $e_{\mathrm{mfp}} = 10$ and thus substitutes the
respective subsequences by the new symbol $a_2 = 2$ (consequently the alphabet
is amended to {0, 1, 2}), to yield $S^{(1)} = 1202$ having $H(S^{(1)}, 1) = 1.5$. Finally, a
repeated application of the pair substitution step terminates after $N_{\mathrm{ps}} = 4$ steps for
the constant sequence $S^{(4)} = 5$. For the slightly modified sequence S = 101010 (initially
also having maximal entropy H(S, 1) = 1), the NSRPS algorithm terminates after just
a single pair substitution step, where the constant sequence reads $S^{(1)} = 222$. One
may now conclude that the sequence 110010 exhibits a higher degree of randomness than the
sequence 101010, since the NSRPS algorithm requires a larger number of pair substitution
steps in order to arrive at a constant sequence. This is in accord with intuition, since,
in contrast to the sequence S = 110010, S = 101010 exhibits a regular structure. The
value $N_{\mathrm{ps}}$ tells how well a given sequence can be compressed in terms of the NSRPS
routine. A small (large) value of $N_{\mathrm{ps}}$ indicates that the sequence is highly (hardly)
compressible. In accord with Eq. (10) we then define the NSRPS based algorithmic
entropy rate as

$$h_N^{(\mathrm{NSRPS-AE})}[S(T)] = \frac{N_{\mathrm{ps}}[S(T)]}{\langle N_{\mathrm{ps}}[S(\infty)] \rangle}. \qquad (12)$$

In contrast to the NSRPS-BE measure explained in subsection 3.2, the NSRPS-AE
measure requires no further tuning of a method-specific parameter.
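
The worked example above can be reproduced with a short Python sketch that counts the number of pair substitution steps needed to reach a constant sequence. The function names are illustrative, pairs of identical symbols are counted without overlap, and ties between equally frequent pairs are broken in favour of the pair encountered first; these details are assumptions and may differ from the implementation of Ref. [12].

```python
from collections import Counter


def most_frequent_pair(seq):
    """Most frequent pair; a run such as 000 contributes a single pair 00."""
    counts = Counter()
    i = 0
    while i < len(seq) - 1:
        counts[(seq[i], seq[i + 1])] += 1
        overlap = i + 2 < len(seq) and seq[i] == seq[i + 1] == seq[i + 2]
        i += 2 if overlap else 1
    return max(counts, key=counts.get)  # first-seen pair wins ties


def substitute(seq, pair, new_symbol):
    """Replace non-overlapping occurrences of pair, scanning left to right."""
    out = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out


def nsrps_steps(seq):
    """Number of pair substitution steps until the sequence is constant."""
    current = list(seq)
    if not current:
        return 0
    next_symbol = max(current) + 1
    steps = 0
    while len(set(current)) > 1:
        current = substitute(current, most_frequent_pair(current), next_symbol)
        next_symbol += 1
        steps += 1
    return steps


print(substitute([1, 1, 0, 0, 1, 0], (1, 0), 2))  # [1, 2, 0, 2]
print(nsrps_steps([1, 1, 0, 0, 1, 0]))            # 4
print(nsrps_steps([1, 0, 1, 0, 1, 0]))            # 1
```

Dividing `nsrps_steps` of a sequence recorded at temperature T by its average over fully random sequences of the same length then gives the normalized NSRPS-AE value.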



4. Results 



Using the methods illustrated in the preceding section and for a range of temperatures
including the critical point, we numerically compute the per-symbol entropies and
approximate complexities for the 2D Ising Ferromagnet in subsection 4.1. In subsection
4.2 we then discuss the finite-size scaling behavior of the observables with respect to the
system size L. Further, we analyze the finite-size scaling behavior of the approximate
complexity in the sequence length N for the data-compression based estimators
NSRPS-AE and ZLIB-AE in subsection 4.3. In subsection 4.4 we report the results obtained
for a spin-flip dynamics based on the Wolff cluster algorithm. Finally, in subsection 4.5
we present the results obtained for the two different dynamics in purely information
theoretic terms.



4.1. Numerical results for the entropy rate and approximate complexity 

In Fig. 5(a) we show the results for the per-symbol entropy obtained by using the
different estimators introduced earlier. As a benchmark we here consider the curve
of $\langle h_N^{(\mathrm{BE})}[S(T)] \rangle$, obtained for 100 independent sequences of length N = 10^. For the
computation of an individual estimate $h_N^{(\mathrm{BE})}[S]$ for a given sequence $S$, the block size
was restricted to M < 10. As pointed out in Ref. [33], upon analysis of an ensemble of
independent finite sequences that stem from the same source, the per-symbol entropies
associated with the sequences are subject to statistical fluctuations. So as to account for
the spread of the per-symbol entropy among the 100 independent sequences at each value
of T, the shaded area in the main plot indicates the difference between the maximal
and minimal value of the entropy rates thus obtained. As evident from the figure, the
entropy rates computed using the NSRPS-BE method for n = 25 compare quite well
to the benchmark results. In this regard, the inset illustrates the difference between
the NSRPS-BE and BE measures in units of the standard deviation for the NSRPS-BE
results, defined as

$$\delta(T) = \frac{\langle h_N^{(\mathrm{NSRPS-BE})}[S(T)] \rangle - \langle h_N^{(\mathrm{BE})}[S(T)] \rangle}{\mathrm{sDev}(h_N^{(\mathrm{NSRPS-BE})}[S(T)])}. \qquad (13)$$

The inset shows that for T > 2.2, the NSRPS-BE method systematically overestimates
the benchmark curve. While the deviation increases for T up to a value close to Tc, it
decreases as T → ∞. Note that for the whole range of temperatures considered, the
numerical estimates obtained using the NSRPS-BE and BE methods satisfy |δ(T)| < 1.
Further, an analysis for different values of n reveals that for n < 20 (n > 30) and in the
low (large) T domain it holds that |δ(T)| > 1 (not shown). The computation of the
per-symbol entropy by means of the estimators $h_N^{(\mathrm{ZLIB-AE})}$ and $h_N^{(\mathrm{NSRPS-AE})}$ (defined in
Eqs. (11) and (12), respectively) is less precise, see Fig. 5(a), and will not be discussed
further.

Figure 5. Results for (a) the per-symbol entropy and (b) approximate complexity
using different estimators for sequences of length N = 10^. The results are averaged
over a number of 100 independent sequences that characterize the spin-flip dynamics
of the 2D Ising model simulated via Metropolis dynamics at different temperatures
T. The dashed line indicates the result for the average value obtained using the
M-block entropy (BE). The shaded area gives an account of
the difference between the maximal and minimal values of the entropy rate and
approximate complexity (in (a) and (b), respectively), obtained using the estimator
BE. As evident from the main plots, the NSRPS-BE estimator (see Eq. (9)) yields
results in agreement with those of BE. The compression based methods ZLIB-AE and
NSRPS-AE (Eqs. (11) and (12), respectively) yield only upper (lower) bounds to the
true per-symbol entropy (approximate complexity). The inset in subfigure (a) shows
the difference between the estimates obtained via the NSRPS-BE and BE method in
units of the standard error for the NSRPS-BE estimator (see text).

Figure 6. System size dependence of the per-symbol entropy (main plot) and
approximate complexity (inset). The sequence length is fixed to N = 10^. (a) shows the
results obtained using the NSRPS-BE method (data points) and BE method (dashed
lines) introduced in subsects. 3.2 and 2.1, respectively; (b) shows the results obtained
using the NSRPS-AE and ZLIB-AE methods introduced in subsection 3.3 (here, the
dashed lines are just guides to the eye). In either case, the solid line in the inset
(without symbols) illustrates the curve $1 - \langle h_N[S(T)] \rangle$ for L = 64.

As explained earlier, the approximate complexity associated with a given sequence $S$
is computed as

$$c_N[S] = h_N[\pi[S]] - h_N[S]. \qquad (14)$$

The numerical results obtained for the approximate complexities, considering 100
independent sequences of length N = 10^, are presented in Fig. 5(b). Again, we
consider the averages obtained by means of the BE method as the benchmark to which
the other methods are compared. As for the entropy rates considered above, the
NSRPS-BE method yields an approximate complexity which
compares well to the BE estimate. However, note that for temperatures T ≲ 2.1 the
NSRPS-BE estimate overestimates the BE estimate systematically. The results obtained
by means of the estimators $c_N^{(\mathrm{ZLIB-AE})}$ and $c_N^{(\mathrm{NSRPS-AE})}$ (computed similar to Eq. (14))
only yield a crude lower bound on the approximate complexity, as compared to the more
sophisticated estimates.
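
As an illustration of how such a compression-based approximate complexity might be obtained with a black-box compressor, the following Python sketch takes the difference between the normalized compressed length of a shuffled copy of the sequence and that of the sequence itself. It rests on two assumptions that are not confirmed by the text at this point: that the correlation-free reference sequence is obtained by randomly shuffling the symbols, and that each symbol is stored as one byte before calling zlib. The names `compressed_length` and `approx_complexity` are hypothetical.

```python
import random
import zlib


def compressed_length(bits):
    """Length in bytes of the zlib-compressed sequence (one byte per symbol)."""
    return len(zlib.compress(bytes(bits), 9))


def approx_complexity(bits, rng, n_ref=20):
    """Normalized compressed length of a shuffled copy minus that of bits."""
    n = len(bits)
    # Pseudo-random sequences emulate the T = infinity normalization.
    refs = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_ref)]
    norm = sum(compressed_length(r) for r in refs) / n_ref
    shuffled = list(bits)
    rng.shuffle(shuffled)  # assumption: shuffling removes all correlations
    return (compressed_length(shuffled) - compressed_length(bits)) / norm


rng = random.Random(0)
n = 10000
periodic = [0, 1] * (n // 2)          # strongly correlated symbols
disordered = [rng.randint(0, 1) for _ in range(n)]
frozen = [0] * n                       # fully ordered
for name, seq in (("periodic", periodic), ("random", disordered), ("frozen", frozen)):
    print(name, approx_complexity(seq, rng))
```

In this sketch both the fully ordered and the fully random sequence give values near zero, while the strongly correlated periodic sequence gives a large value, which mirrors the qualitative behavior expected of a quantity that emphasizes correlations.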

As defined here, the numerical value of $c_N$ lies between zero and one and is
small at low and high temperatures, where the sequences are algorithmically simple and
random, respectively. At intermediate temperatures, i.e. in the vicinity of the critical
temperature, long range correlations between symbols in the sequences appear, resulting
in a comparatively large value of $c_N$. Thus, the quantity behaves as we expect for a
quantity indicating the "complexity" of a sequence. Note that although the magnitude of the
approximate complexity at a given temperature might differ among the various estimators,
the peak positions of the different curves are all located close to the critical temperature
Tc ≈ 2.269.

4.2. Finite-size scaling regarding the system size

An analysis of the per-symbol entropy for sequences of length N = 10^, obtained for
systems of different sizes L = 32 through 256 and for the different estimation methods,
is shown in the main plots of Figs. 6(a) and (b). Regarding the elaborate NSRPS-BE
method, the curves for different system sizes have a common crossing point close to Tc,
see Fig. 6(a). At a given temperature below (above) the critical point, increasing L
results in a smaller (larger) value of the per-symbol entropy. For T → 0 as well as for
T → ∞, the data points for the different system sizes coincide (note that the figure
only shows a zoom-in on the interval T ∈ [2.2, 2.4], enclosing Tc). While it is possible
to distinguish the data curves for L = 32 and L = 256, it is hard to tell apart the
curves for L = 128 and L = 256. This might indicate that the data curves at L = 128
and 256 are reasonable approximations to the thermodynamic limit L → ∞. For the
approximate complexity shown in the inset of Fig. 6(a) we find that its peak gets more
pronounced as L increases. Further, for the smallest system size, i.e. L = 32, the peak is
located slightly below Tc, shifting towards a higher temperature as L increases. Again,
it is hard to tell apart the data curves for L = 128 and 256. The dashed line in the
inset illustrates the curve $1 - \langle h_N[S(T)] \rangle$ for L = 64, which compares well to the scaling
of $\langle c_N[S(T)] \rangle$ for T > Tc (as pointed out in subsection 2.3, this is due to the fact that
$\langle h_N[\pi[S(T)]] \rangle \approx 1$ for temperatures T > Tc). While similar observations can be made
for the NSRPS-AE method, the finite-size scaling for the ZLIB-AE method is different.
As can be seen from Fig. 6(b), data curves obtained using the ZLIB-AE method exhibit
no crossing point at all. However, there is still the tendency that for temperatures above
Tc, an increasing system size leads to an increasing value of the per-symbol entropy.

4.3. Sequence length dependence of the approximate complexity

A relevant parameter that controls how well the statistics of the spin-flip dynamics is
captured by the analyzed sequences is the sequence length N. The longer the sequence,
the more patterns can be resolved and the better the approximation of the statistical
properties of the spin-flip dynamics. In this regard, we performed a finite-size scaling
analysis for the peak location of the approximate complexity as obtained by the
NSRPS-AE and ZLIB-AE measures. We did not perform such an analysis for the NSRPS-BE
method, since, as evident from Fig. 4, the convergence properties of the entropy rate
make it hard to obtain reliable results for sequence lengths N < 10^.

Figure 7. Results of the finite-size scaling analysis for the peak location considering
the approximate complexity for different sequence lengths N at fixed system size
L = 256. (a) illustrates the data curves obtained by means of the NSRPS-AE method
and the inset shows the scaling of the associated peak locations, (b) indicates the
scaling behavior of the finite-size fluctuations for the approximate complexities. The
subfigures (c) and (d) show the same data as (a) and (b), obtained using the ZLIB-AE
method. Lines are guides to the eye only.

Figure 8. Results of the finite-size scaling analysis for the per-symbol entropy
(main plot) and complexity (inset) considering different system sizes L for a spin-flip
dynamics based on the Wolff cluster algorithm. The solid line in the inset illustrates
the curve $1 - \langle h_N[S(T)] \rangle$ for L = 64. In the figure, data points correspond to the
estimate obtained using the NSRPS-BE method, while the dashed lines indicate the
respective results obtained using the BE method.

Considering the NSRPS-AE estimator, Fig. 7(a) indicates that for increasing values
of N the peak position approaches the critical value Tc from above. The same holds
for the ZLIB-AE estimator, see Fig. 7(c). As evident from the figures, the curves
for the different sequence lengths N are peaked at effective critical points $T_c^{\mathrm{eff}}(N) > T_c$.
The peaks get more pronounced as N increases. By fitting 5th-order polynomials to
the data curves in order to estimate the precise location of the peaks (where error bars
are obtained via bootstrap resampling [34]), we found that the effective critical points
exhibit the scaling behavior $T_c^{\mathrm{eff}}(N) = T_c^{\infty} + a \cdot N^{-b}$, see insets of Figs. 7(a),(c). This
corresponds to standard finite-size scaling near phase transitions [16] when changing
the system size L, which is instead kept fixed here. The fit parameters obtained for the
NSRPS-AE (ZLIB-AE) method read $T_c^{\infty} = 2.286(1)$ and b = 0.50(1) ($T_c^{\infty} = 2.289(1)$
and b = 0.52(2)). Note that this exponent is different from the scaling observed when
varying the system size, where the exponent −1/ν = −1 is relevant and related to the
correlation length exponent ν = 1. For increasing sequence length, the approximation
of the peak position by means of the 5th-order polynomials gets rather imprecise,
which might account for the deviation between $T_c^{\infty}$ and the true critical temperature
Tc. Accordingly, we performed a further analysis considering the NSRPS-AE method
for sequences up to length N = 5000 only (where the peaks can be fit well), resulting
in the improved estimate $T_c^{\infty} = 2.270(5)$. Note that in any case, the value of $T_c^{\infty}$ is
reasonably close to the critical temperature, and that these results are obtained at fixed
L = 256 by varying the length of the sequences. Further, the finite-size fluctuations
$\chi_N(T) = N \times \mathrm{var}(c_N[S(T)])$, shown in Figs. 7(b),(d) for NSRPS-AE and ZLIB-AE,
respectively, exhibit a peak that also tends towards Tc as N increases. Thereby, once
the sequence length exceeds N ≈ 5 × 10^, the peaks are located right at the critical
temperature. These results show that the methods we used to estimate the "complexity"
of a sequence not only give qualitatively satisfying results but can also be used for
rather precise quantitative estimates from finite-size system data.
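
The bootstrap error bars mentioned above can be illustrated with a generic Python sketch. It resamples a data set with replacement, re-evaluates an estimator on each resample, and takes the standard deviation of the resulting estimates as the error bar. Here the sample mean serves as a stand-in estimator; in the analysis described in the text the estimator would be the peak location of the fitted polynomial. The function name `bootstrap_error` and the number of resamples are illustrative assumptions.

```python
import math
import random
import statistics


def bootstrap_error(values, estimator, n_resamples=500, rng=None):
    """Standard deviation of the estimator over bootstrap resamples."""
    rng = rng or random.Random()
    n = len(values)
    estimates = [
        estimator(rng.choices(values, k=n))  # draw n values with replacement
        for _ in range(n_resamples)
    ]
    return statistics.stdev(estimates)


# Usage with synthetic data: for the mean, the bootstrap error should be
# close to the textbook standard error stdev / sqrt(n).
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(200)]
err = bootstrap_error(data, statistics.mean, 500, rng)
reference = statistics.stdev(data) / math.sqrt(len(data))
print(err, reference)
```

The advantage of the bootstrap in the present context is that it also applies to estimators, such as a fitted peak position, for which no simple closed-form error formula is available.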

4.4. Spin-flip dynamics based on the Wolff cluster algorithm

In Fig. 8 we show the results for the entropy rate and approximate complexity, where,
instead of a single spin-flip MC simulation using Metropolis dynamics (as above), we
used the Wolff cluster algorithm [35]. The time unit within these simulations is given
by a single cluster-construction process (in contrast: using the single spin-flip MC
simulation for a square lattice of side length L, the time unit consists of a number
of L × L independent spin-flip attempts for randomly chosen spins, comprising one
sweep). The Wolff cluster algorithm is most efficient at Tc, yielding binary sequences
where subsequent symbols are effectively uncorrelated, as reflected by the minimum
of the approximate complexity, where $\langle c_N[S(T_c)] \rangle$ assumes a small value close to zero.
Further, the fact that $\langle h_N[S(T)] \rangle$ exhibits a peak value of nearly one at Tc indicates that,
at the critical temperature, the sequences appear to be algorithmically random. Hence,
although the system being simulated is the same, the strong differences in the resulting
entropy and complexity curves are easily understood from the dynamics of the algorithm.

The system size dependence of the per-symbol entropy, illustrated in the main plot
of Fig. 8, can be understood in terms of the cluster-construction characteristics of the
Wolff cluster algorithm. At a temperature T < Tc, a cluster constructed at a given
time step comprises almost all spins on the lattice, regardless of the system size (in the
limit T → 0, a cluster comprises all spins). It is thus very likely that a given spin
is contained in that cluster and gets flipped frequently. At temperatures T > Tc, the
clusters have some typical size that does not depend on the size of the system (in the
limit T → ∞, a cluster consists of a single spin only). Considering a fixed temperature
T > Tc, the larger the system size L, the smaller the relative size of a (typical) cluster
appears. Consequently, a given spin is less likely to be contained in a cluster as L
increases and the spin gets flipped only rarely. Thus, the flip frequency decreases upon
increasing system size.
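
The two limits of the cluster size described above can be checked with a minimal Python sketch of a single Wolff cluster update. The sketch assumes a square lattice with periodic boundary conditions, coupling J = 1 and k_B = 1, so that aligned neighbors are added to the cluster with probability 1 − exp(−2/T). The function name `wolff_step` and the data layout (a list of lists of ±1) are illustrative choices, not the authors' implementation.

```python
import math
import random


def wolff_step(spins, temperature, rng):
    """Build and flip one Wolff cluster in place; return the cluster size."""
    size = len(spins)
    p_add = 1.0 - math.exp(-2.0 / temperature)  # assumes J = 1, k_B = 1
    x, y = rng.randrange(size), rng.randrange(size)
    seed_spin = spins[x][y]
    spins[x][y] = -seed_spin          # flip spins as they join the cluster
    stack = [(x, y)]
    cluster_size = 1
    while stack:
        cx, cy = stack.pop()
        neighbors = [
            ((cx + 1) % size, cy),
            ((cx - 1) % size, cy),
            (cx, (cy + 1) % size),
            (cx, (cy - 1) % size),
        ]
        for nx, ny in neighbors:
            # Only spins still aligned with the seed orientation can join.
            if spins[nx][ny] == seed_spin and rng.random() < p_add:
                spins[nx][ny] = -seed_spin
                stack.append((nx, ny))
                cluster_size += 1
    return cluster_size


rng = random.Random(3)
lattice = [[1] * 8 for _ in range(8)]
cold = wolff_step(lattice, 0.1, rng)   # very low T: cluster spans the lattice
hot = wolff_step(lattice, 1e9, rng)    # very high T: cluster is a single spin
print(cold, hot)
```

Recording the orientation of one fixed lattice site after each call to `wolff_step` would yield a binary sequence with one symbol per cluster-construction process, which is the time unit used in this subsection.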

Further, at temperatures below Tc, where a given spin flips rather frequently, we
find $\langle h_N[\pi[S(T)]] \rangle \approx 1$. Hence, for T < Tc it holds rather precisely that
$\langle c_N[S(T)] \rangle \approx 1 - \langle h_N[S(T)] \rangle$. For T > Tc, $\langle h_N[\pi[S(T)]] \rangle$ decreases slightly with increasing temperature
(not shown). The decrease is monotonic and the observable takes values between 1
and 0.8, so we still find that the above relation holds approximately. This is also evident
by visually inspecting the data curves for the per-symbol entropy and approximate
complexity displayed in Fig. 8. In the inset, the dashed line illustrates $1 - \langle h_N[S(T)] \rangle$
for L = 64, which compares well to the respective data curve showing $\langle c_N[S(T)] \rangle$. This
leads us to suggest that for the spin-flip dynamics induced by the Wolff cluster algorithm,
$h_N$ and $c_N$ are trivially correlated.

In general, one expects that as T → ∞ the results for the dynamics using the
Wolff cluster algorithm match those of the single spin-flip Metropolis dynamics.
However, as a comparison of Figs. 6 and 8 indicates, the results for the per-symbol
entropies and approximate complexities in the limit T → ∞ for the different dynamics
are completely different. This difference is solely due to the definition of a time unit
regarding the two spin-flip dynamics. By means of a proper rescaling of the time unit
related to the Wolff cluster algorithm we verified that the sequences supplied by both
spin-flip dynamics yield similar results as T → ∞. In particular, at T = 10.0 and for
a comparatively small sequence length of N = 10^ we obtained $\langle c_N[S(T)] \rangle = 0.022(2)$
and $\langle c_N[S(T)] \rangle = 0.023(7)$ for Metropolis dynamics and the Wolff cluster algorithm using the
NSRPS-AE method, respectively (we further checked that for sequence lengths N > 10^
the NSRPS-BE method yields approximately the same results).

4.5. Approximate complexity-entropy diagrams for the different spin-flip dynamics

In the subsections above, we illustrated the information-theoretic observables "entropy
rate" and "approximate complexity" as functions of the model parameter T. In order to
provide a completely information-theoretic characterization of the different spin-flip
dynamics, we illustrate the corresponding (approximate) complexity-entropy diagrams
in Fig. 9. As evident from the figure, the relation between the information-theoretic
coordinates depends on the underlying spin-flip dynamics. In case of the single-spin flip
Metropolis update the dynamics close to the critical temperature suffers from severe
slowing down due to long-range correlations. Off criticality, correlations are less strong.
Consequently, the approximate complexity (which effectively accounts for correlations)
is peaked at an entropy value ≈ 0.5, reflecting the critical temperature. In terms of
this very simple update mechanism, the evolution of the 2D Ising FM appears to be
highly intricate. On the contrary, the elaborate dynamics provided by the Wolff cluster
algorithm makes the evolution of the 2D Ising FM at the critical temperature maximally
efficient. It does not suffer from critical slowing down and successive spin configurations
are effectively uncorrelated. Consequently, the time series related to the orientation of a
particular spin on the lattice appears to be maximally random at Tc. Hence, the critical
temperature is reflected by a large entropy rate and low approximate complexity. The
region of very low entropy rate and very large complexity indicates periodic sequences,
obtained at small temperatures T where at each time step almost all spins are flipped.
Further, as pointed out in subsection 4.4, Fig. 9 shows that for the dynamics provided by
the Wolff cluster algorithm approximate complexity and entropy are trivially related via
$c_N = 1 - h_N$. Thus, there is no accentuated peak in the approximate complexity-entropy
diagram.

Figure 9. Approximate complexity-entropy diagram obtained for a spin-flip
dynamics based on single-spin Metropolis update and the Wolff cluster algorithm.
In the analysis, we considered symbolic sequences $S$ of length N = 10^ at different
temperatures T (the relation between temperature and entropy is illustrated in Fig.
5(a)). In the figure, data points correspond to the results obtained using the
NSRPS-BE method, while the solid lines indicate the respective results obtained using the
BE method. The figure characterizes the different dynamics in purely
information-theoretic coordinates.

5. Summary 

In the presented article we performed numerical experiments to assess the performance
of three different entropy estimation algorithms that are based on symbolic substitution
methods. Binary test sequences were obtained by simulating the 2D Ising FM via
single spin-flip dynamics at different temperatures T, thereby recording the orientation
of a single spin. We found that the most elaborate entropy estimation algorithm
yields results that are in good agreement with those obtained by an information
theoretic method based on the M-block Shannon entropy. We further proposed and
analyzed a measure that approximately accounts for the statistical complexity of the
binary sequences. The respective observable, termed approximate complexity, can
be understood in terms of information theory. It measures the amount by which
the entropy rate on the single-symbol level exceeds the asymptotic entropy rate and
it can easily be computed by means of black-box data-compression algorithms. To
support intuition on the gross behavior of the approximate complexity, note that
the larger the correlations between the symbols in a given sequence, the larger the
numerical value of the approximate complexity. In the limits of completely
ordered and fully random symbol sequences it assumes a value of zero. Therefore, the
approximate complexity behaves as one would naively expect for a quantity measuring
the "complexity" of a system. For all entropy estimation procedures considered, we find
that the approximate complexity is peaked at the critical point. Even for the less
precise entropy estimation algorithms that systematically overestimate (underestimate)
the entropy rate (approximate complexity), a finite-size scaling analysis in the sequence
length shows that the peak of the approximate complexity tends towards the critical
point of the 2D Ising FM as the sequence length increases. Further, qualitative
differences between the dynamics induced by a single-spin flip Metropolis update and the
Wolff cluster algorithm are discussed in terms of the information theoretic observables.

For future work, we plan to apply these methods to systems exhibiting quenched
disorder, like spin glasses and random-field systems, to find out whether the proposed
methods work with similarly high efficiency and precision.

6. Acknowledgements 

OM acknowledges financial support from the DFG (Deutsche Forschungsgemeinschaft)
under grant HA3169/3-1. The simulations were performed at the HERO cluster at the
University of Oldenburg (Germany) which is funded by the German Science Foundation
(DFG, INST 184/108-1 FUGG) and the Ministry of Science and Culture (MWK) of the
Lower Saxony state.

References 

[1] C. R. Shalizi and J. P. Crutchfield. Computational Mechanics: Pattern and Prediction, Structure
    and Simplicity. J. Stat. Phys., 104:817-878, 2001.
[2] D. Loewenstern, H. Hirsh, P. Yianilos, and M. Noordewier. DNA Sequence Classification Using
    Compression-Based Induction. Technical Report DIMACS Tech. Rep. 95-04, DIMACS Center
    - Rutgers University, 1977.
[3] W. Ebeling. Prediction and entropy of nonlinear dynamical systems and symbolic sequences with
    LRO. Physica D, 109:42-52, 1997.
[4] A. Baronchelli, E. Caglioti, and V. Loreto. Measuring complexity with zippers. Eur. J. Phys.,
    page S69, 2005.
[5] P. Grassberger. Data Compression and Entropy Estimates by Non-sequential Recursive Pair
    Substitution. 2002.
[6] A. Puglisi, D. Benedetto, E. Caglioti, V. Loreto, and A. Vulpiani. Data compression and learning
    in time sequence analysis. Physica D, page 92, 2003.
[7] J. P. Crutchfield and K. Wiesner. Simplicity and complexity. Physics World, pages 36-38,
    February 2010.
[8] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, New York, 2006.
[9] A. N. Kolmogorov. On tables of random numbers. Sankhya, the Indian Journal of Statistics A,
    25:369-376, 1963.
[10] G. Chaitin. Algorithmic Information Theory. Cambridge University Press, New York, 1987.
[11] J. Machta. Complexity, parallel computation and statistical physics. Complexity, 11:46-64, 2006.
[12] N. Nagaraj, M. S. Kavalekalam, A. Venugopal, and N. Krishnan. Lossless Compression and
    Complexity of Chaotic Sequences. (not published), 2011. A summary of this article is available
    at papercore.org, see http://www.papercore.org/Nagaraj2011.
[13] P. Grassberger. Toward a quantitative theory of self-generated complexity. Int. J. Theo. Phys.,
    25:907-938, 1986.
[14] J. P. Crutchfield and K. Young. Inferring statistical complexity. Phys. Rev. Lett., 63:105-108,
    1989.
[15] K. Wiesner, M. Gu, E. Rieper, and V. Vedral. Information-theoretic bound on the energy cost of
    stochastic simulation, 2011. preprint arXiv:1110.4217.
[16] N. Goldenfeld. Lectures On Phase Transitions And The Renormalization Group. Westview Press,
    Jackson, 1992.
[17] D. V. Arnold. Information-theoretic Analysis of Phase Transitions. Complex Systems, 10:143-155,
    1996.
[18] We used zlib version 1.2.3.3, see http://zlib.net.
[19] W. Ebeling and M. A. Jiménez-Montaño. On Grammars, Complexity, and Information Measures
    of Biological Macromolecules. Math. Biosci., 52:53-71, 1980.
[20] M. A. Jiménez-Montaño, W. Ebeling, and T. Poeschel. SYNTAX: A computer program to
    compress a sequence and to estimate its information content. (not published), 2002.
[21] D. Benedetto, E. Caglioti, and D. Gabrielli. Non-sequential recursive pair substitution: some
    rigorous results. J. Stat. Mech., page P09011, 2006.
[22] L. M. Calcagnile, S. Galatolo, and G. Menconi. Non-sequential Recursive Pair Substitutions and
    Numerical Entropy Estimates in Symbolic Dynamical Systems. J. Nonlin. Sci., page 723, 2010.
[23] D. P. Feldman, C. S. McTague, and J. P. Crutchfield. The organization of intrinsic computation:
    Complexity-entropy diagrams and the diversity of natural information processing. CHAOS,
    18:043106, 2008.
[24] J. Ziv and A. Lempel. A Universal Algorithm for Sequential Data Compression. IEEE Trans.
    Inform. Theory, IT-23:337, 1977.
[25] T. Bell, I. H. Witten, and J. G. Cleary. Modeling for Text Compression. ACM Computing Surveys,
    21:557, 1989.
[26] J. P. Crutchfield and D. P. Feldman. Regularities unseen, randomness observed: Levels of entropy
    convergence. CHAOS, 13:25-54, 2003.
[27] Papercore is a free and open access database for summaries of scientific (currently mainly physics)
    papers, see http://www.papercore.org/.
[28] T. Schürmann and P. Grassberger. Entropy estimation of symbol sequences. CHAOS, 6:414, 1996.
[29] M. E. J. Newman and G. T. Barkema. Monte Carlo Methods in Statistical Physics. Clarendon
    Press, Oxford, 1999.
[30] O. Weiss, M. A. Jiménez-Montaño, and H. Herzel. Information Content of Protein Sequences. J.
    theor. Biol., 206:379-386, 2000.
[31] J. Wilms, M. Troyer, and F. Verstraete. Mutual information in classical spin models. J. Stat.
    Mech., 2011(10):P10011, 2011.
[32] I. Erb and N. Ay. Multi-Information in the thermodynamic Limit. J. Stat. Phys., 115:949, 2004.
[33] M. A. Jiménez-Montaño, W. Ebeling, T. Pohl, and P. E. Rapp. Entropy and complexity of finite
    sequences as fluctuating quantities. BioSystems, 64:23-32, 2002.
[34] A. K. Hartmann. Practical Guide to Computer Simulations. World Scientific, Singapore, 2009.
[35] U. Wolff. Collective Monte Carlo Updating for Spin Systems. Phys. Rev. Lett., 62:361, 1989.



